Multimodal Translation System Using Texture-Mapped Lip-Sync Images for Video Mail and Automatic Dubbing Applications
Authors
Abstract
We introduce a multimodal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion by synchronizing it with the translated speech. The system introduces both a face synthesis technique that can generate the lip shape of any viseme and a face tracking technique that can estimate the original position and rotation of the speaker's face in an image sequence. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, generated from a 3D wire-frame model that is adaptable to any speaker. This approach provides translated image synthesis with an extremely small database. Tracking of the face motion in a video image is performed by template matching: the translation and rotation of the face are detected using a 3D personal face model whose texture is captured from a video frame. We also propose a method to customize the personal face model with our GUI tool. By combining these techniques with translated voice synthesis, automatic multimodal translation suitable for video mail or automatic dubbing into other languages can be achieved.
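The abstract states that face motion is tracked by template matching against a textured 3D personal face model. As an illustrative sketch of the 2D core of that idea (not the authors' actual implementation, which also estimates 3D rotation), the following NumPy code finds the offset of a face template in a frame by exhaustive normalized cross-correlation; the synthetic frame and offsets are assumptions for demonstration only.

```python
import numpy as np

def match_template(frame, template):
    """Return the (row, col) offset in `frame` where `template` best
    matches, scored by normalized cross-correlation. Illustrative
    stand-in for the paper's template-matching face tracker."""
    fh, fw = frame.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# Synthetic check: embed a known patch and recover its position.
rng = np.random.default_rng(0)
frame = rng.random((40, 40))
template = frame[12:20, 25:33].copy()
print(match_template(frame, template))  # (12, 25)
```

A production tracker would search only a small window around the previous frame's estimate and, as in the paper, re-project a textured 3D head model at candidate poses rather than sliding a flat patch.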
Similar articles
Face translation: A multimodal translation agent
In this paper, we present Face Translation, a translation agent for people who speak different languages. The system can not only translate a spoken utterance into another language, but also produce an audio-visual output with the speaker’s face and synchronized lip movement. The visual output is synthesized from real images based on image morphing technology. Both mouth and eye movements are g...
Chapter 16: Joint Audio-Video Processing for Robust Biometric Speaker Identification in Car
In this chapter, we present our recent results on a multilevel Bayesian decision fusion scheme for the multimodal audio-visual speaker identification problem. The objective is to improve recognition performance over conventional decision fusion schemes. The proposed system decomposes the information in a video stream into three components: speech, lip trace and face texture. Lip trac...
AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition
This paper describes the acquisition of the multichannel multimodal database AV@CAR for automatic audio-visual speech recognition in cars. Automatic speech recognition (ASR) plays an important role inside vehicles in keeping the driver away from distraction. It is also known that visual information (lip-reading) can improve ASR accuracy under adverse conditions such as those within a car. The corpus...
Performance Enhancement in Lip Synchronization Using MFCC Parameters
Many multimedia applications and entertainment industry products, such as games, cartoons and film dubbing, require speech-driven face animation and audio-video synchronization. An automatic speech recognition (ASR) system alone does not give good results in noisy environments. An audio-visual speech recognition system plays a vital role in such harsh environments, as it uses both audio and visual informati...
Journal: EURASIP J. Adv. Sig. Proc.
Volume: 2004
Issue: -
Pages: -
Publication date: 2004